Abstract: Many private and public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria (e.g., tweets published by users from a selected region, or tweets that match one or more predefined keywords) and is then monitored for those opinions. There is an emerging need for early crisis detection and response with such targeted streams. Such applications require a good named entity recognition (NER) system for Twitter, one that can automatically discover emerging named entities potentially linked to a crisis. In this paper, we present a two-step unsupervised NER system for targeted Twitter streams, called Novel-NER. In the first step, it leverages the global context obtained from Wikipedia and the Web N-Gram corpus to partition tweets into valid segments (phrases) using a dynamic programming algorithm. Each such tweet segment is a candidate named entity. We observe that the named entities in a targeted stream usually exhibit a gregarious property, owing to the way the stream is constructed. In the second step, Novel-NER constructs a random walk model to exploit this gregarious property in the local context derived from the Twitter stream. Highly ranked segments are more likely to be true named entities. We evaluated Novel-NER against labeled ground truth on two sets of real-life tweets simulating two targeted streams; it achieves performance comparable to that of conventional approaches on both. We also examined various settings of Novel-NER to verify our idea of combining global and local context. In addition, we use Wikipedia and the Microsoft Web N-Gram corpus together with association and correlation measures.
Keywords: Novel-NER, named entity recognition, targeted Twitter stream, random walk model, global and local context
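The two-step pipeline summarized above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not Novel-NER itself: `KNOWN_PHRASES` and `stickiness` are hypothetical stand-ins for the Wikipedia and Web N-Gram global context, `MAX_SEG_LEN` is an assumed value, and the Jaccard edge weights with a PageRank-style damped random walk are assumed choices for modeling the gregarious property in the local context.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for the global context (Wikipedia titles / Web N-Gram
# statistics); a real system would score phrases from those resources.
KNOWN_PHRASES = {"steve jobs", "apple store", "new york"}
MAX_SEG_LEN = 3  # longest segment considered (assumed value)


def stickiness(segment):
    """Hypothetical segment score: known phrases are rewarded by length."""
    if len(segment) == 1:
        return 1.0
    if " ".join(segment) in KNOWN_PHRASES:
        return float(len(segment) ** 2)
    return -1.0  # unknown multi-word segments are penalised


def segment_tweet(tokens):
    """Step 1: dynamic programming that maximises the summed stickiness."""
    n = len(tokens)
    best = [-math.inf] * (n + 1)  # best[i] = best score for tokens[:i]
    best[0] = 0.0
    back = [0] * (n + 1)          # start index of the last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SEG_LEN), i):
            score = best[j] + stickiness(tokens[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    segments, i = [], n
    while i > 0:                  # follow back-pointers to recover segments
        segments.append(" ".join(tokens[back[i]:i]))
        i = back[i]
    return segments[::-1]


def rank_segments(tweet_segments, damping=0.85, iterations=50):
    """Step 2: random walk over segments linked by shared tweets."""
    occurs = defaultdict(set)     # segment -> ids of tweets containing it
    for tid, segs in enumerate(tweet_segments):
        for s in segs:
            occurs[s].add(tid)
    nodes = sorted(occurs)
    if not nodes:
        return []
    # Edge weight = Jaccard similarity of the tweet sets (an assumption).
    weight = {}
    for a in nodes:
        for b in nodes:
            common = len(occurs[a] & occurs[b])
            if a != b and common:
                weight[(a, b)] = common / len(occurs[a] | occurs[b])
    out_sum = defaultdict(float)
    for (a, _), w in weight.items():
        out_sum[a] += w
    n = len(nodes)
    score = {s: 1.0 / n for s in nodes}
    for _ in range(iterations):
        new = {s: (1 - damping) / n for s in nodes}
        for (a, b), w in weight.items():
            new[b] += damping * score[a] * w / out_sum[a]
        # Segments with no neighbours spread their mass uniformly.
        dangling = sum(score[s] for s in nodes if out_sum[s] == 0)
        for s in nodes:
            new[s] += damping * dangling / n
        score = new
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


stream = ["steve jobs opens apple store",
          "apple store crowded after steve jobs event",
          "steve jobs keynote in new york"]
segmented = [segment_tweet(t.split()) for t in stream]
for seg, sc in rank_segments(segmented)[:3]:
    print(f"{seg:12s} {sc:.3f}")
```

In this sketch, segments that co-occur with many others across tweets gain probability mass during the walk, which is how the local context is meant to promote likely named entities; the scores always sum to 1 because every node either passes its full mass along its edges or, if isolated, spreads it uniformly.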